SYSTEM AND METHOD FOR IDENTIFYING WORD PATTERNS IN TEXT 



BACKGROUND OF THE INVENTION 



The Field of the Invention 



The present invention relates to systems and methods for identifying word patterns 
referenced in text, and more specifically, to identifying the word patterns substantially in real 
time. 



The Relevant Art 

The Internet may be the most significant technological development of recent times. 
It allows inexpensive and almost instantaneous communication throughout the world. As 
more and more users begin to take advantage of the Internet, more resources are being 
directed to enhancing the ability of users to make use of information available on the 
Internet. 

Particularly, various tools that assist in speeding up Internet transmission, searching 
the Web, and conducting research are continually being developed and distributed for 
Internet users' benefit. One type of tool that has been developed and which may be used on 
information downloaded from the Internet is a text parser. Much of the content available on 
the Internet is in the form of text documents. Volumes and volumes of information are on 
the Web in text document format. To assist a user in more readily understanding contents 
of text documents, developers have provided document parser programs. 

Such programs typically receive a text document that a user wishes to have parsed 
and store that text document persistently in static memory. The parser then makes continual 
passes over the text, combing it for identified words. Those words can then be identified and 
presented to the user, generally with some type of enhancement. Such enhancements may 
include a dictionary reference, a link to an identified web site, or the like. 

Such programs suffer from the drawbacks of being somewhat cumbersome and 
slow. They require significant processing resources, and accordingly are typically used only 
on powerful computers such as mainframes, work stations, servers, and the like. 
Additionally, the wait for a user while the text is parsed is generally considerable, because 
the multiple passes that are necessary for such parsing take time and generally must be 
conducted remotely. This generally slows down research being conducted, and in general, 
lessens the Internet experience somewhat. 

Therefore, what is needed is a manner of identifying word patterns in text in a quick 
and efficient manner in order to improve research efforts and enhance the abilities of users 
to profitably use the Internet. 

OBJECTS AND BRIEF SUMMARY OF THE INVENTION 

The system and method of the present invention have been developed in response 
to the present state of the art, and in particular, in response to the problems and needs in the 
art that have not yet been fully solved by currently available systems and methods. 
Accordingly, it is an overall object of the present invention to provide a system and method 
that overcomes many or all of the above-discussed shortcomings in the art. 

To achieve the foregoing object, and in accordance with the invention as embodied 
and broadly described herein in the preferred embodiment, an improved system and method 
for identifying word patterns in text is provided. In certain disclosed embodiments, the 
system for identifying objects referenced in a stream of text comprises an input pipeline 
configured to receive an incoming stream of text comprised of words; a text analysis module 
configured to consult a semantic network to automatically identify one or more word patterns 
in the incoming stream of text with a single examination of each word; and an object 
association module configured to reference a known object identified by a word pattern of 
the semantic network. 

The semantic network may be configured to be loaded substantially entirely into 
RAM memory of a processor, and the text analysis module may be configured to consult the 
semantic network within the RAM memory. Additionally, the input pipeline may be 
configured to divide the text. In certain disclosed embodiments the method comprises 
receiving an incoming stream of text comprised of words; consulting a semantic network to 
automatically identify one or more word patterns in the incoming stream of text with a single 
examination of each word; and referencing a known object identified by a word pattern of 
the semantic network. 

The method may also comprise loading the semantic network substantially entirely 
into RAM memory of a processor and the step of consulting the semantic network may be 
conducted by consulting the semantic network within the RAM memory. 

The semantic network may be consulted in a hierarchical order moving from 
identified nodes to related nodes linked with the identified nodes. In one embodiment, the 
method examines words in the stream of text in a sequential order as the words are received 
and formats the stream of text to represent identified objects without persistently storing the 
stream of text. The method may also involve breaking the stream of text into individual 
words and analyzing each word in an order of occurrence of the word in the stream of text 
by comparing the individual words to identified words in the semantic network. 

In addition, the method may involve finding a match between an individual word 
in the stream of text and a word within the semantic network. Upon finding the match, the 
method compares the individual word and an adjacent word of the stream of text to a word 
pattern in the semantic network to find a word pattern involving the word. Additionally, 
words of the stream of text may be continually added to recognized word patterns and the 
result compared to other word patterns in the semantic network until no more word patterns 
containing the individual word are located. Links are preferably followed between the word 
patterns and recognized objects, and the identified known objects presented to a user. 

In one embodiment, the identified objects are presented to a user by providing links 
between identified word patterns in the stream of text and the objects in a knowledge base 
that the word patterns identify. The links may be provided in the form of URLs. 

BRIEF DESCRIPTION OF THE DRAWINGS 

In order that the manner in which the advantages and objects of the invention are 
obtained will be readily understood, a more particular description of the invention briefly 
described above will be rendered by reference to specific embodiments thereof which are 
illustrated in the appended drawings. Understanding that these drawings depict only typical 
embodiments of the invention and are not therefore to be considered to be limiting of its 
scope, the invention will be described and explained with additional specificity and detail 
through the use of the accompanying drawings in which: 

Figure 1 is a schematic block diagram illustrating one embodiment of a system for 
identifying word patterns in text of the present invention. 

Figure 2 is a schematic block diagram illustrating one embodiment of a word 
pattern identification module of the present invention. 

Figure 3 is a schematic block diagram illustrating one embodiment of an object 
residing within a semantic network of the present invention. 

Figure 4 is a schematic block diagram illustrating one example of a semantic network 
of the present invention. 

Figure 5 is a schematic flow chart diagram illustrating one embodiment 
of a method for generating a semantic network of the present invention. 

Figure 6 is a schematic flow chart diagram illustrating one embodiment of a method 
of the present invention for identifying word patterns in text. 

Figure 7 is a schematic flow chart diagram illustrating one embodiment of a method 
of the present invention for consulting a semantic network to identify words, word patterns, 
and objects in text. 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 

Shown in Figure 1 is a word pattern identification system 100 of the present 
invention. Within the system 100 are shown system components including a server 102, a 
client 104, and a knowledge base 106. Each of these components 102, 104, 106 preferably 
operates as or on a digital computer. The system components 102, 104, 106 may operate 
within a common computer, but in the preferred embodiment, operate as separate computers and 
are in communication with each other over one or more digital communication mediums 
such as a computer network 108. The computer network 108 may comprise any form of 
digital communication, including a local area network, a wide area network, or the Internet. 

Within the server 102 is depicted a static memory 110, such as a hard disk drive. 
Also shown are a bank of parallel processors 112 and an input pipeline 116. Within the 
static memory 110 is disposed a word pattern identification module 200, which will be 
explained in greater detail below with reference to Figure 2. The word pattern identification 
module 200 operates on one or more of the processors 112, typically within RAM memory 
114. While four processors 112 have been shown, a single processor 112 may be used in one 
embodiment, while any number of multiple processors 112 may be employed in other 
embodiments. 

The input pipeline 116 is a mechanism for providing data to the processors 112 and 
for returning results from the processors 112. In one embodiment, the input pipeline 116 
may merely comprise a bus which is multiplexed between the processors 112. In alternate 
embodiments, the input pipeline comprises a separate processor 112 or other digital logic 
device for dividing tasks between the processors 112. Within the input pipeline 116, or 
otherwise communicating therewith, are a thread division module 115 and an assembly 
module 117. The thread division module 115 in one embodiment is used to divide tasks, 
such as text documents to be processed, into multiple threads for submission to the individual 
processors 112. The assembly module 117 in one embodiment is used for assembling the 
results generated by the processors 112 and submitting the results back to the client 104. 

The client 104 in one embodiment comprises a computer station from which a user 
may gain access to the Internet. Examples include a personal computer (PC), a web terminal, 
a palm computing device, and the like. Within the client 104 is located some form of 
interface with the Internet, such as a web browser 120. Also provided is a client-side module 
125, which allows the client 104 to receive services from the server 102. Within the web 
browser 120 is shown a web page 122 which has been downloaded from the Internet, or 
which the user desires to download from the Internet. Within the web page 122 are shown 
one or more text documents 124 and one or more uniform resource locators (URLs) 126. 

Within the client-side module 125 is shown an interface module 128 for interfacing 
with the server 102 and specifically with the word pattern identification module 200. Also 
shown are an XML receipt module 130 for receiving and processing the results of a text analysis 
conducted by the word pattern identification module 200 and a knowledge base linking 
module 132 for linking results of the text analysis with the knowledge base 106. 

Shown within the knowledge base 106 is a database 134 containing objects 300 and 
tags 314 labeling those objects. The objects 300 may comprise persons, places, things, ideas, 
and the like. The tags 314 are various labels by which those objects may be known. Thus, 
the knowledge base 106 is much like an on-line encyclopedia, and indeed, in certain 
embodiments, may comprise an on-line encyclopedia. 

Referring now to Figure 2, shown therein is one embodiment of a word pattern 
identification module 200 of the present invention. Within the word pattern identification 
module 200 are shown a semantic network generation module 202, a text analysis module 
204, and a semantic network 400. One example of a semantic network 400 will be explained 
in greater detail below with respect to Figure 4. 

Within the semantic network generation module 202 are shown a knowledge base 
interface 206, an object parser 208, a semantic network interface 210, a word placement 
module 212, a word pattern placement module 214, a node linking module 216, a 
hierarchical level assignment module 218, an object association module 220, and an object 
labeling module 222. In the depicted embodiment, the knowledge base interface 206 is 
configured to communicate with the knowledge base 106 of Figure 1 in order to receive 
identified objects 300 from the knowledge base 106. The objects 300 and tags 314 for these 
objects 300 are then used to generate (or populate) the semantic network 400. 

The object parser 208 receives the objects 300 and tags 314, together with other 
associated information, such as object type 424, and parses that information. For instance, 
the tags 314 are broken down into individual words and word patterns 412 comprising 
combinations of words 410 (of Figure 4) that make up the tags 314 that identify an object 
300. The semantic network interface 210 then generates the semantic network 400, or if it 
is already in existence, populates it with words 410, word patterns 412, and objects 300 
transmitted from the knowledge base 106 and parsed by the object parser 208. In one 
embodiment, the words 410 are located at an extreme hierarchical level as word nodes 411 
of the semantic network 400. 

The word placement module 212 locates words 410 within the tags 314 and passes 
the located words 410 to the semantic network interface 210 for placement in the semantic 
network. The word pattern placement module 214 takes the words 410 and builds those 
words 410 into word patterns 412. For instance, the object International Business Machines 
may be transmitted from the knowledge base 106. The words "International Business 
Machines" form the tag 314 of the object, which is preferably also transmitted. The tag 314 
is also a word pattern which can be broken down into the words international, business, and 
machines. These words 410 are then, in turn, recombined by the word pattern placement 
module 214 into word patterns 412 including, in this example, International Business and 
International Business Machines. The abbreviation IBM may be an alternate tag 314 that is 
also placed as a word and/or word pattern into the semantic network to identify the object 
International Business Machines. Each of the word patterns that can be formed from the 
various tags 314 of an object 300 is placed at a node 413 within the semantic network 400. 
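
The recombination of tag words into word patterns may be sketched as follows. This is only an illustrative sketch, not the disclosed implementation: the function names `tokenize_tag` and `prefix_patterns` are hypothetical, and the sketch assumes that the word patterns are the successive left-anchored combinations of the tag words, as the International Business Machines example suggests.

```python
def tokenize_tag(tag):
    """Break a tag into lower-case words, dropping surrounding periods and commas."""
    return [w.strip(".,").lower() for w in tag.split() if w.strip(".,")]


def prefix_patterns(tag):
    """Return the multi-word patterns formed by successively adding each next word.

    Assumption: patterns are built left to right, so a three-word tag yields
    a two-word pattern and a three-word pattern.
    """
    words = tokenize_tag(tag)
    return [" ".join(words[:n]) for n in range(2, len(words) + 1)]


if __name__ == "__main__":
    print(tokenize_tag("Mr. Don M. Simpson"))
    print(prefix_patterns("International Business Machines"))
```

A single-word tag such as IBM yields no multi-word pattern under this sketch and would be represented by its word alone.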

A node linking module 216 is preferably configured to interconnect the various 
nodes 411, 413, 421, 424, of the semantic network 400. For example, in the semantic 
network 400 of Figure 4, the object Mr. Don M. Simpson is shown. The object 300 with its 
tags 314, 315 in one embodiment is received from the knowledge base 106 and parsed by the 
object parser 208. The resultant words 410, mr, don, m, and simpson, are placed into word 
nodes 411 by the word placement module 212. These words 410 are then combined into 
word patterns 412 by the word pattern placement module 214 and placed at word pattern 
nodes 413. The word patterns 412 are linked together by the node linking module 216. 
Also during this process, the hierarchical level assignment module 218 preferably assigns 
a hierarchical level to each of the nodes 410, 412, 300, 416. In one embodiment, the fewer 
words in the word or word pattern that inhabits the node, the lower the hierarchical level of 
the node. 

Returning to the example of Figure 4, the word nodes 411 containing mr, don, m, and simpson 
are assigned word identification numbers. The word pattern nodes 413 are assigned word pattern 
identification numbers, which in the depicted embodiment comprise negative numbers. 
Objects 300 are similarly assigned object identification numbers. Object types 416 also have 
corresponding identification numbers. 

An object association module 220 is preferably configured to link 420 or otherwise 
associate an object 300 with the word patterns 412 devised from the tag 314 of that object 
300. For example, as depicted in Figure 4, the word pattern mr don m simpson 412 is linked 
420 with the object Mr. Don M. Simpson 300. The word pattern International Business 
Machines, which is referenced in this example from the word pattern ibm 412, is linked 420 
with the object International Business Machines 300. 

One example of an object 300 as stored in the semantic network 400 is shown in 
Figure 3. The object 300 of Figure 3 is depicted as having components associated together 
with the object 300, including an identification key 310, a knowledge base link 312, a tag 
314, alternate tags or aliases 315, a type 318, and a type identification number 319. Of 
course, much more information about the object may be stored within the semantic network 
400, but need not be, as the knowledge base link 312 allows for a connection to be made to 
the typically more complete data about an object within the knowledge base. 

The identification key 310 is preferably a code that is globally unique for each object 
300 within the semantic network 400, in order to uniquely identify each object 300. In some 
embodiments, the knowledge base link 312 may comprise a unique code from the knowledge 
base, and obviate the need for a separate identification key 310. The object type 318 is the 
type of the object, such as person, place, company, idea, and the like. The object type 318 
may be accompanied by or replaced by an object type identification number 319 that 
is unique for each object type. Preferably, each object 300 is represented only once in the 
semantic network 400, though more than one word pattern 412 may point to an object 300. 
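
The components of the object 300 of Figure 3 may be pictured as a simple record. The sketch below is illustrative only: the class name `KnownObject`, the field names, and every sample value (the key, the link string, and the type number) are hypothetical and are not taken from the figures.

```python
from dataclasses import dataclass, field


@dataclass
class KnownObject:
    """Illustrative record for an object 300; all field names are assumptions."""
    identification_key: str          # globally unique key 310
    knowledge_base_link: str         # link 312 back to the fuller knowledge base entry
    tag: str                         # primary tag 314
    aliases: list = field(default_factory=list)  # alternate tags 315
    object_type: str = ""            # type 318, e.g. person, place, company
    type_id: int = 0                 # type identification number 319

    def all_tags(self):
        """Every label by which the object may be known."""
        return [self.tag, *self.aliases]


# Hypothetical sample values, for illustration only.
ibm = KnownObject(
    identification_key="obj-0001",
    knowledge_base_link="kb://objects/0001",
    tag="International Business Machines",
    aliases=["IBM"],
    object_type="public company",
    type_id=2,
)
```

Keeping only these few fields in the semantic network is consistent with the description above, since the knowledge base link 312 allows the more complete data to be fetched when needed.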

Returning to Figure 2, an object labeling module 222 is used to link 422 the object 
300 with its object type 416, of which a selected finite set are preferably listed within the 
semantic network 400. For instance, in the semantic network 400 of Figure 4, two different 
object types 416 are listed, people and public companies. Each object type 416 is preferably 
provided with its own unique type identification number, as depicted. Of course, while only 
two object types are depicted, it should be readily apparent that an infinite number of 
possibilities for object types exist, and a typical semantic network may have numerous object 
types 416 within it. 

The text analysis module 204 is in one embodiment used during operation of the 
word pattern identification module 200 to service requests from the client 104 of Figure 1 
to process a text document 124 and identify word patterns 412 and/or objects 300 within the 
text document 124. While the semantic network generation module 202 and the text analysis 
module 204 are shown within a common module 200, the two could comprise separate 
software programs. In the depicted embodiment, however, the word pattern identification 
module 200 is a single software program configured to operate within the server 102, and to 
coordinate with a client module 125 which is preferably a separate software program. 

Shown within the text analysis module 204 of the depicted embodiment are a 
tokenizing module 224, a word buffer 226, a semantic network interface module 228, a word 
comparison module 230, a node comparison module 232, a word joiner module 234, an 
object location module 236, a thread coordination module 238, and a seam processing 
module 240. The text analysis module 204 preferably operates within one or more 
processors 112 of Figure 1, in conjunction with the semantic network 400, which is also 
preferably loaded into RAM 114 of each processor 112. The two communicate through a 
semantic network interface module 228. The text analysis module 204 and the semantic 
network 400 may be located within separate software programs, and indeed, the semantic 
network may be remote to the text analysis module, communicating over a network, modem, 
or the like. Nevertheless, in the depicted embodiment, the text analysis module 204 and the 
semantic network 400 operate within the same software program 200 on the server 102, and 
a version is loaded into RAM 114 of each parallel processor 112. 

Text documents 124 or threads of text documents 124 may be received by the 
tokenizing module 224 and broken or "tokenized" into separate words. Each tokenized word 
(and more preferably, a group of adjacent words) is received into the word buffer 226 as the 
words stream into the text analysis module 204. While the tokenized word is within the 
word buffer 226, it is examined by the word comparison module 230 to determine if the 
word can be found within the lexical database. In one embodiment, this comprises 
comparing the word, using the semantic network interface module 228, to words 410 within 
the semantic network 400. The word nodes 411 may be arranged within the semantic 
network 400 using a beneficial data structure. In one embodiment, for example, the words 
are listed alphabetically within a binary tree. In another embodiment, sorting methods such 
as a bubble sort are used to find matching words 410. In a further example, a hashing 
algorithm is used. 
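
Two of the lookup arrangements mentioned above may be sketched as follows. The sketch is an assumption-laden illustration: a binary search over an alphabetically sorted list stands in for the binary tree, a Python dictionary stands in for the hashing algorithm, and the word list and the identification numbers are invented for the example.

```python
import bisect

# Hypothetical word nodes, kept in alphabetical order for binary search.
SORTED_WORDS = sorted(
    ["international", "business", "machines", "ibm", "mr", "don", "m", "simpson"]
)

# Hash-based index: word -> illustrative positive word identification number.
WORD_IDS = {word: i for i, word in enumerate(SORTED_WORDS, start=1)}


def lookup_hashed(word):
    """Return the word id using hashing, or None if the word is unknown."""
    return WORD_IDS.get(word.lower())


def lookup_sorted(word):
    """Return the word id using binary search over the sorted list, or None."""
    w = word.lower()
    i = bisect.bisect_left(SORTED_WORDS, w)
    if i < len(SORTED_WORDS) and SORTED_WORDS[i] == w:
        return i + 1
    return None
```

Either arrangement lets a streamed word be tested against the word nodes without scanning every node in turn.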

Preferably, the text document 124 is loaded entirely into the buffer 226. 
Nevertheless, other embodiments may be employed where only a portion of the text 
document is loaded into the buffer 226. Additionally, it is preferred that the buffer 226 is 
located in RAM memory 114. Thus, the text document is preferably never persistently 
stored. That is, it is never stored in static memory 110. This allows the word pattern and 
object identification to be conducted and results returned in real time or near real time. 

When a word is found to be within the semantic network, the word pattern 
comparison module 232 compares the word against word patterns 412 of the semantic network 
400. In so doing, an adjacent word, preferably the next successive word in the text 
document 124, is added to the word by the word joiner module 234. The string of words is 
then compared to word patterns 412 of the semantic network 400. In one embodiment, this 
comprises following links 418 within the semantic network 400 to associated word patterns 
412 within word pattern nodes 413. So doing eliminates the need for examining every word 
pattern 412 in finding matches. This process is continued, following the links until no more 
word pattern matches are found. At this point, the object location module 236 is used to 
determine whether the located matching word patterns 412 correspond to objects 300 within 
the semantic network 400. 

In one embodiment, only the lowest level word pattern that corresponds to a tag of 
an object is used, and that object is returned as the result. For instance, if the words 
international, business, and machines pass through the text analysis module 204, several of 
the corresponding word patterns may be linked to objects 300. For instance, international 
business may be an object with a type corresponding to a course of study, a discipline, and 
the like. The word pattern international business machines may correspond to the 
object International Business Machines, which is a public company. Both of these results 
may be returned in certain embodiments, but in one embodiment, only the object for the 
lowest level word pattern, in this example, International Business Machines, is returned. 

The objects 300 that are located from the stream of text 124 are compiled with an 
object compilation module 242. The objects are assembled in order and may be placed in 
a text document, such as an XML document, and then transmitted back to the client 104. The 
compiled objects may be periodically streamed back to the client 104, but in one 
embodiment are returned to the assembly module 117 of Figure 1. When parallel processing 
is used, the thread coordination module 238 is used to break the incoming stream of words 
from the text document 124 into multiple threads, and may perform the function recited for 
the thread division module 115, or may coordinate with the thread division module 115. 

In one embodiment, the thread coordination module keeps track of where seams in 
the threads occur. The seam processing module 240 then assembles the words from adjacent 
threads at the seams of the threads and examines a selected number of words on both sides 
of each seam for words, word patterns, and objects in the manner described above. Located 
objects are then assembled by the assembly module 117, which may be a designated seam 
processing module 240, into the final XML or other type of document for submission to the 
client 104. 
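
The tracking of seams and the re-examination of words around each seam may be sketched as follows. This is a hedged illustration only: the function names, the equal-sized division of the word stream, and the window size `k` are assumptions, since the description does not specify how threads are sized or how many words are examined on each side of a seam.

```python
def split_into_threads(words, n_threads):
    """Divide a word list into roughly equal threads and record the seam positions.

    A seam is the index in the original stream at which a new thread begins.
    """
    size = max(1, -(-len(words) // n_threads))  # ceiling division
    threads = [words[i:i + size] for i in range(0, len(words), size)]
    seams = list(range(size, len(words), size))
    return threads, seams


def seam_windows(words, seams, k):
    """Return (start index, words) for the k words on each side of every seam."""
    windows = []
    for seam in seams:
        start = max(0, seam - k)
        windows.append((start, words[start:seam + k]))
    return windows


if __name__ == "__main__":
    stream = "shares of international business machines rose today".split()
    threads, seams = split_into_threads(stream, 2)
    print(threads)
    print(seam_windows(stream, seams, 2))
```

In this example the pattern international business machines straddles the seam, so neither thread alone contains it, but the seam window does and can be analyzed in the manner described above.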

Figure 5 illustrates one embodiment of a method 500 for generating a semantic 
network of the present invention. The method 500 is also one embodiment of a manner of 
use of the semantic network generation module 202 of Figure 2. The method of Figure 5 
starts 510, after which a semantic network generation system is provided 512. In one 
embodiment, the semantic network generation system comprises the semantic network 
generation module 202 of Figure 2, and may be provided together with the other components 
of the system 200 or independently. 

The method 500 also preferably comprises connecting 514 to a data source. In one 
embodiment, this comprises connecting the server 102 of Figure 1 to the knowledge base 106 
in order to download objects 300 for submission into the semantic network 400. This may 
be conducted, as discussed, by the knowledge base interface 206 of Figure 2. 

Once connected to the data source, objects 300 are preferably received from the data 
source and parsed 516. This may be conducted, as discussed above, by the object parser 208. 
In one embodiment, the tag 314 of the object 300 is parsed into words 410. The parsed 
words are placed 518 into a word index, which in the depiction of Figure 4, corresponds to 
nodes 410 at an extreme (e.g., top) level of the semantic network. Of course, no action need 
be taken on words that are already in the word index when encountered. The placement of 
the words into the word index may be conducted by the word placement module 212 of 
Figure 2. 

Word patterns 412 are then generated from the words 410 that make up the tag 314 
and are placed 520 in nodes 413 within the semantic network 400. Thus, for every object 
300 and corresponding tag 314, a word node 411 is formed for every word in the tag 314, 
and word patterns 412 are formed for every successive combination of those words 410. 
This may be conducted by the word pattern placement module 214. The generated nodes are 
then linked together 522. The nodes may be linked 522 by the node linking module 216. 
In one embodiment, each combination of words 410 is linked 522 to the word pattern 412 
formed. Word patterns 412 and words 410 combined to make lower level word patterns 412 
are linked 418 to the lower level word patterns 412. Additionally, word patterns 412 that 
comprise object tags 314 are linked 420 to the objects 300 identified. Each object 300 may 
also be linked 422 to an object type 416 and to any other desired identifying information. 

Word patterns 412 that correspond to objects 300 are then associated 526 with the 
respective objects 300 identified by the word patterns 412. This may be conducted by the 
object association module 220. 

Objects 300 may then be associated 528 with object types 416. This may be 
conducted by the object labeling module 222. The method 500 continues this process as 
objects are being downloaded from the data source, testing 530 to see if more objects are 
available. If so, the method 500 loops back and receives and parses the received objects 300 
and continues on through the successive operations. If more objects 300 are not available, 
the method 500 ends 532. 
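
The operations of the method 500 may be sketched, under stated assumptions, as follows. The class `SemanticNetwork`, its attribute names, and the object keys are hypothetical. Word patterns are assumed to be built left to right from each tag, word identification numbers are assumed to be positive, and word pattern identification numbers are negative, following the depicted embodiment.

```python
class SemanticNetwork:
    """Illustrative in-memory network; names and layout are assumptions."""

    def __init__(self):
        self.word_ids = {}      # word -> positive id (word index, step 518)
        self.pattern_ids = {}   # tuple of word ids -> negative id (step 520)
        self.links = {}         # (node id, next word id) -> pattern id (links 418)
        self.node_object = {}   # node id -> object key (links 420)
        self.object_type = {}   # object key -> object type (links 422)

    def _word(self, word):
        # Words already in the index are left untouched.
        return self.word_ids.setdefault(word, len(self.word_ids) + 1)

    def add_object(self, key, tags, obj_type):
        """Parse an object's tags and place words, patterns, and links."""
        self.object_type[key] = obj_type
        for tag in tags:
            words = tag.lower().replace(".", "").split()
            if not words:
                continue
            ids = [self._word(w) for w in words]
            node = ids[0]
            for n in range(2, len(ids) + 1):
                pattern = tuple(ids[:n])
                pid = self.pattern_ids.setdefault(
                    pattern, -(len(self.pattern_ids) + 1)
                )
                self.links[(node, ids[n - 1])] = pid  # link node to longer pattern
                node = pid
            self.node_object[node] = key  # the full tag identifies the object


net = SemanticNetwork()
net.add_object("ibm", ["International Business Machines", "IBM"], "public company")
net.add_object("simpson", ["Mr. Don M. Simpson"], "person")
```

Because each link is keyed by the current node and the next word, a later lookup need only follow links from a matched node rather than examine every word pattern.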

Figure 6 is a flow chart diagram illustrating one embodiment of a method 600 for 
locating word patterns and objects in a stream of text. The method 600 starts 610 and 
provides 612 a text analysis module and client software. In one embodiment, these comprise 
the text analysis module 204 of Figure 2 and the client side module 125 of Figure 1. A 
semantic network is also preferably provided 614. In one embodiment, a semantic network 
is provided in the manner described above with reference to Figure 5. 

Under the method 600, a client requests 616 a text analysis service in which word 
patterns and/or objects are to be located in a body of text. The body of text preferably 
comprises a text document 124, and is preferably received by the method 600 as a stream of 
text. In one embodiment, a user of the client 104 utilizes a web browser 120 and requests 
a web page 122. The user also requests that the text 124 of the web page 122 be analyzed 
for identified objects 300. The request may be conducted in any manner, including toggling 
the service off and on in a preference file of the web browser 120. The client may then 
make the request merely by clicking on a URL identifying the web page 122. 

The web browser 120 then initiates a request over the Internet for the web page 122. 
The request is intercepted by the client side module 125, which specifies in the request that 
the web page 122 is to first be sent to the server 102. The server 102 then receives the text 
document 124 and processes the text document, locating identified objects 300 referenced 
in the text 124 of the web page 122. Of course, other suitable manners of submitting a 
request 616 from a client 104 to a text processing program to locate objects in a block of text 
may also be employed. 

Under the method 600, the requested text 124 is transmitted 618 to the text 
processing program. In one embodiment, this comprises streaming the text documents 124 
of a web page 122 or other designated text to the server 102. The text may be received into 
the input pipeline 116. 

The text may then be broken 622 into a plurality of threads, if the server 102 has 
multiprocessing capabilities. The threads are submitted to the individual processors, and the 
word pattern identification software on the processors receives the threads, which are 
processed simultaneously. Preferably, a copy of the semantic network is loaded into RAM 
114 of each participating processor 112. The method 600 is thus scalable, as more 
processors can be provided for larger text documents 124. 

Within each processor 112, the text is tokenized 624 into words as it streams into 
the processor 112. This may be conducted by a tokenizing module 224. The tokenized 
words are then compared to the contents of the semantic network to see whether the words 
are present therein. In one embodiment, this comprises placing the words in the word buffer 
226 and using the word comparison module 230 to determine if a match for the word is 
found among the word nodes 411. If so, the word pattern comparison module 232 and word 
joiner module 234 are used to find word patterns 412. One manner of doing this is illustrated 
in Figure 7. 
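
As a hedged illustration of the tokenizing 624 of streamed text, the following Python sketch yields words as fragments of text arrive and then checks each word against a set standing in for the word nodes 411. The function names, the regular expression used to define a word, and the lowercasing of words are assumptions of this sketch only.

```python
import re
from typing import Iterable, Iterator, List, Set

WORD_RE = re.compile(r"[A-Za-z0-9']+")
TRAILING_WORD_RE = re.compile(r"[A-Za-z0-9']+\Z")


def tokenize_stream(fragments: Iterable[str]) -> Iterator[str]:
    """Yield lowercase words from text fragments as they stream in."""
    carry = ""  # partial word held back until the next fragment arrives
    for fragment in fragments:
        data = carry + fragment
        match = TRAILING_WORD_RE.search(data)
        if match:
            # The fragment may end mid-word, so hold the last word back.
            carry = match.group(0)
            data = data[:match.start()]
        else:
            carry = ""
        for word in WORD_RE.findall(data):
            yield word.lower()
    if carry:
        yield carry.lower()


def words_in_network(tokens: Iterable[str], word_nodes: Set[str]) -> List[str]:
    """Return the tokens that match a word node (modeled here as a set)."""
    return [token for token in tokens if token in word_nodes]
```

For example, words_in_network(tokenize_stream(["New Yo", "rk City"]), {"york"}) rejoins the word split across the two fragments before the comparison is made.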

Referring to Figure 7, a method 700 is shown therein that starts 710 and then 
receives 712 a word to be analyzed. The word is preferably received into the word buffer 
226. The word is then compared 714 to the lexical database. In one embodiment, this 
comprises the word comparison module 230 comparing the word against the word nodes 411 
of the semantic network 400. This comparison is preferably conducted quickly, allowing the
text 124 to stream in while results are transmitted in response in real time, because the
semantic network is preferably stored in its entirety within a common RAM memory. The
semantic network may also utilize other data structures, as discussed, for quick searching of
the words 410 for matches.

The method 700 thus checks 716 for a match to each word streamed in. If a match 
is not found, the method 700 loops back and receives the next inputted word. If a match is 
found, the next successive word is added to the original word and the resulting string is 
compared to word pattern nodes 413 connected 418 to the node 411 at which the match was
found. 

If a matching word pattern is found 720, the method 700 loops back and adds 718 
the next successive word and checks 720 for a matching word pattern. This process 
continues until no match is found. The links 420 are then followed 722 to located objects
300. As discussed, this may be conducted for all located word patterns 412 leading to an
object, or only for the most qualified word patterns 412. The located objects 300 are then
provided as output. This may comprise listing the objects 300 on an output document or data 
file, as indicated at a step 724. 

The method 700 is continued while a test 726 for more submitted words is positive. 
When no more words are submitted, the method 700 ends 728. 
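
The loop of the method 700 might be sketched in Python as follows, purely as an illustration. In this sketch the word nodes 411 are modeled as a dictionary mapping a first word to the word patterns connected to it, and the links 420 as a dictionary mapping a word pattern to an object identifier. These data structures, the name find_objects, and the choice to report only the longest matching word pattern are assumptions rather than the implementation of the described embodiments.

```python
from typing import Dict, List, Set, Tuple

Pattern = Tuple[str, ...]


def find_objects(words: List[str],
                 word_nodes: Dict[str, Set[Pattern]],
                 links: Dict[Pattern, str]) -> List[Tuple[int, str, str]]:
    """Return (start index, matched pattern, object id) for each match."""
    located = []
    i = 0
    while i < len(words):
        patterns = word_nodes.get(words[i])
        if patterns is None:
            i += 1  # no matching word node: receive the next word
            continue
        best = None
        end = i + 1
        while end <= len(words):
            candidate = tuple(words[i:end])
            if candidate in patterns and candidate in links:
                best = candidate  # longest complete pattern so far
            # Stop once no connected pattern begins with the candidate.
            if not any(p[:len(candidate)] == candidate for p in patterns):
                break
            end += 1  # add the next successive word and check again
        if best is not None:
            located.append((i, " ".join(best), links[best]))
            i += len(best)
        else:
            i += 1
    return located
```

A word pattern that is cut off by the end of the input, such as "salt lake" where only "salt lake city" is known, produces no match in this sketch.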

Returning to Figure 6, once all objects 300 in each thread are located, the objects at 
the seams of the threads are identified 628. This may be conducted by the seam processing 
module 240 of Figure 2, in the manner discussed above. Of course, if only a single thread 
is used, this step may be omitted. 
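
One possible way to identify 628 objects at a seam is sketched below in Python. The manner actually used by the seam processing module 240 is discussed elsewhere in this description and is not reproduced here; the approach shown, which examines a window of words around the boundary between two adjacent threads and keeps only word patterns that straddle the boundary, is an assumption of this sketch, as are the name seam_matches and the modeling of word patterns as tuples of words.

```python
from typing import List, Set, Tuple

Pattern = Tuple[str, ...]


def seam_matches(left_words: List[str], right_words: List[str],
                 patterns: Set[Pattern]) -> List[Pattern]:
    """Return the word patterns that span the seam between two chunks."""
    if not patterns:
        return []
    max_len = max(len(p) for p in patterns)
    if max_len < 2:
        return []  # a one-word pattern cannot straddle a seam
    # A straddling pattern uses at most max_len - 1 words from either side.
    tail = left_words[-(max_len - 1):]
    head = right_words[:max_len - 1]
    window = tail + head
    seam = len(tail)  # index in window of the first right-hand word
    hits = []
    for start in range(seam):  # must begin left of the seam
        for end in range(seam + 1, len(window) + 1):  # must end right of it
            candidate = tuple(window[start:end])
            if candidate in patterns:
                hits.append(candidate)
    return hits
```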

Identified objects 300 are preferably compiled 630 into a file for transmission back 
to the client. In one embodiment, this comprises compiling the text 124 together with 
notations of the located objects 300 where they occur in the text into a document such as an 
XML document. This document is then transmitted back to the client and presented 632 to 
a user. In one embodiment, due to the efficiency of the system and method of the present 
invention, the results are presented in real time or in near real time. 
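
The compiling 630 of the text together with notations of the located objects might, for illustration, be sketched in Python using the standard xml.etree.ElementTree module. The element and attribute names (document, text, objects, object, id, start, length) and the use of word offsets as notations are hypothetical choices made for this sketch; the description does not specify a schema.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def compile_results(words: List[str],
                    located: List[Tuple[int, int, str]]) -> str:
    """Build an XML string from the words and (start, length, object id) items."""
    root = ET.Element("document")
    ET.SubElement(root, "text").text = " ".join(words)
    objects = ET.SubElement(root, "objects")
    for start, length, object_id in located:
        node = ET.SubElement(objects, "object", id=object_id,
                             start=str(start), length=str(length))
        # Repeat the matched word pattern so each notation is self-describing.
        node.text = " ".join(words[start:start + length])
    return ET.tostring(root, encoding="unicode")
```

The notations are kept apart from the text in this sketch, which avoids mixed content; an embodiment could equally place the notations inline where the word patterns occur.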

In one embodiment, the results are presented 632 by turning the identified word
patterns 412 that comprise the tag 314 of an object 300 into a URL in the results document,
which, as discussed, may comprise an XML document. These links may be followed to more
information about the objects 300. For instance, the links 126 may lead to listings of the
object or directions to the listings of the object 300 in the knowledge base 106 or other
database. By clicking on the link, a user can bring up a file on the object 300. For instance,
a window may pop up with a file from the knowledge base regarding the object 300. The
file may be a dictionary definition, further links, or other information regarding the object
that may be useful to the user.
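
A minimal Python sketch of turning an identified word pattern into a link follows. The base address https://kb.example.invalid/objects/ is a placeholder and not an address of the knowledge base 106, and the name make_link and the anchor markup are likewise assumptions of this sketch.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape, quoteattr

# Hypothetical base address for object listings.
KNOWLEDGE_BASE_URL = "https://kb.example.invalid/objects/"


def make_link(pattern_text: str, object_id: str) -> str:
    """Wrap a word pattern in an anchor pointing at the object's listing."""
    url = KNOWLEDGE_BASE_URL + quote(object_id, safe="")
    # quoteattr adds the surrounding quotes and escapes the attribute value.
    return "<a href={}>{}</a>".format(quoteattr(url), escape(pattern_text))
```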

In a further embodiment, rather than provide the text document 124 from the
Internet, the text document 124 may be provided from a voice recognition program. In such
an embodiment, the identified objects could, for instance, comprise commands to the voice 
recognition program. 

The present invention is claimed and described herein in terms of "modules." As 
used herein, modules may refer to various subsystems of the invention. For example, where 
the invention is implemented in software form, a module refers to lines of software code
instructions. Modules also refer to electronic hardware configured to achieve the given
purpose of the module. As such, a module is a structural element. 

As will be readily understood by one skilled in the art of software development, more
than one instruction may exist within a module. The instructions may not necessarily be
located contiguously, and could be spread out among various different portions of one or
more software programs, including within different objects, routines, functions, and the like.
Similarly, the hardware components of a subsystem or module, such as integrated circuits,
logic gates, discrete devices, and the like, need not be organized into a single circuit, but
could be distributed among one or more circuits. Unless stated otherwise, hardware or
software implementations may be used interchangeably to achieve the structure and function
of the disclosed modules.

The present invention may be embodied in other specific forms without departing
from its spirit or essential characteristics. The described embodiments are to be considered 
in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, 
indicated by the appended claims rather than by the foregoing description. All changes 
which come within the meaning and range of equivalency of the claims are to be embraced 
within their scope. 

What is claimed is: 